Abstract

This paper addresses the task of time-separated aerial image registration. The ability to solve this problem accurately and reliably is important for a variety of subsequent image understanding applications. The principal challenge lies in the extent and nature of transient appearance variation that a land area can undergo, such as that caused by changes in illumination conditions, seasonal variations, or occlusion by non-persistent objects (people, cars). Our work introduces several novelties: (i) unlike all previous work on aerial image registration, we approach the problem using a set-based paradigm; (ii) we show how local, pair-wise constraints can be used to enforce a globally good registration using a constraints graph structure; (iii) we show how a simple holistic representation derived from raw aerial images can be used as a basic building block of the constraints graph in a manner which achieves both high registration accuracy and speed. We demonstrate: (i) that the proposed method already outperforms the state of the art for pair-wise registration, achieving greater accuracy and reliability while at the same time reducing the computational cost of the task; and (ii) that increasing the number of available images in a set consistently reduces the average registration error.


Figure 1: Input images (for easier visualization, the patches shown include only approximately 10% of the area used as actual input). These correspond to approximately the same land area imaged on different days and at different times of the day, and registered using a state-of-the-art commercial registration system which uses both GPS and image data. Substantial registration errors remain (e.g. the misalignment between (d) and (e) is 78 pixels).


1 Introduction

The goal of the present work is to achieve accurate registration of time-separated aerial images. The key challenge of this task emerges as a consequence of the potentially large transient appearance changes, such as those which may be caused by different illumination conditions, seasonal variations, or mobile objects with a non-permanent presence (e.g. people, cars, lawnmowers). Some of these challenges are illustrated in Fig. 1 using a sample of images taken from our evaluation data set. Reliable registration is an important pre-processing step required by a wide range of practical applications, including semantic labelling of images [Mnih and Hinton, 2012] and the detection of meaningful (high-level) structural changes [Chhabra, 2009; Luo and Li, 2011]. Unlike previous work on aerial images, we formulate and address the registration problem using an image set-based framework, rather than a sequence of independent pair-wise registrations.

Registration, as a general problem of geometric normalization, is pervasive in computer vision. Unsurprisingly, the corpus of relevant previous work is rich and varied, often with a high degree of domain specificity [Zitova and Flusser, 2003]. In aerial imaging applications, most registration approaches described in the literature focus on man-made structures, choosing a priori to exploit the presence of line features [Wong and Clausi, 2007], rectangular buildings [Noronha and Nevatia, 2001], or roads [Mnih and Hinton, 2010]. All of these methods register images in a pair-wise manner, aligning either an aerial image to an aerial image, or an aerial image to a map. No previous work on aerial image registration operates directly on image sets as input, nor can any of it be readily extended to this problem setup.

To place the present work in broader context and better appreciate our contributions, it is worth noting that set-based registration methods have been described in other application domains of computer vision, most notably in the field of medical image understanding [Metz et al., 2011]. However, an examination of these approaches shows that they too cannot be readily applied to the problem we consider in this paper. Indeed, medical images are often taken in calibrated conditions, consistent across acquisitions, while aerial images exhibit extreme variations due to uncontrollable illumination and seasonal effects, amongst others. For example, the set-based method based on Havrda-Charvát cumulative residual entropy proposed in [Chen et al., 2010] requires the shapes of objects of interest to be known in advance. The approaches described in [Lord et al., 2007] and [… et al., 1995] suffer from a similar limitation in this context, given that they require reliable contour information. In [Wachinger and Navab, 2012], the authors, whose target domain is free of such challenges, do not consider the issues of illumination change or the potential presence of transient structures and occlusions. Similar observations hold for a number of other approaches [Thevenaz et al., 1998; Foroosh et al., 2002]. A major recent development draws on advances in sparsity learning, often with a specific focus on the alignment of images of faces [Wagner et al., 2012; Ghosh and Manjunath, 2013]. Because of their computational cost, these methods are limited to images far smaller than those our problem involves. In actual practice, the standard registration procedure used by geographers involves the manual selection of a set of placemarks, followed by the application of a bundle adjustment algorithm [Triggs et al., 2000]. This is not only a laborious process but also one which requires considerable expertise and experience, because the number and the choice of placemark locations are highly scene specific yet crucial for the success of the overall scheme.

Our work introduces several significant novelties. Firstly, we show how a simple quasi-invariant representation derived from raw aerial images, employed in a coarse-to-fine fashion with a suitable matching function, can be used to improve the accuracy and reliability of registration already at the level of pair-wise image alignment. Secondly, we describe how this approach can be employed as part of a novel set-based registration framework. The proposed framework is built upon what we term a constraints graph, which is used to propagate local, pair-wise registration information across the image set. We demonstrate that the resulting method outperforms the current state of the art in aerial image registration in terms of accuracy and reliability, as well as speed.


2 Proposed approach 

In this section we describe the key technical contributions of 
the present paper. We start with an overview of the proposed 
approach and its key constituent elements, and follow with a 
detailed description of each. 

2.1 Overview 

At the centre of the proposed approach is an optimization scheme. While the optimization scheme itself is global (i.e. it operates jointly over the entire image set), globality is achieved through the propagation of local registration constraints. The aim is to co-register an entire image set by finding the best compromise between different pair-wise registrations. Both for the sake of computational efficiency and for reasons inherently linked to the nature of the problem under consideration (see Sec. 2.2 for details), only some pair-wise registrations contribute to the objective (fitness) function which is maximized. Like most previous work, we restrict our consideration to translations only.

2.2 Constraints graph 

The method we propose in this paper is founded on two key ideas. The first of these is that the registration of an entire set of images can be assessed, and should be formulated, as a function of pair-wise registrations. The second idea, and premise, is that the magnitude and nature of confounding variation present in realistic images of approximately the same geographical location acquired at different times make pair-wise registration difficult and unreliable. Thus, our method aims to find the best solution (relative registration adjustment) which balances different pair-wise assessments of registration quality. Formally, our registration can be written as an optimization problem which comprises the maximization of the following fitness function (we use the term “fitness function” to emphasize that the desired solution maximizes its value, rather than “objective function”, which is less specific and is generally used when minimization is sought):

$$
J(\{\Delta r_k\}; \{I_k\}) = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} \, \rho\big(\Delta r_i - \Delta r_j;\, \mathcal{C}(I_i), \mathcal{C}(I_j)\big) \tag{1}
$$

Here, $J(\{\Delta r_k\}; \{I_k\})$ is the value of the fitness function for the set of registration adjustments $\{\Delta r_k\}$, relative to a specific reference image, for the input image set $\{I_k\} = \{I_1, \ldots, I_n\}$ (the choice of the reference image does not affect the result, since (1) depends only on the differences $\Delta r_i - \Delta r_j$, so we consistently select the first image of the set, i.e. $I_1$, as the reference); $\mathcal{C}(I_i)$ is the quasi-invariant representation of the view captured by image $I_i$; $\rho(\Delta r_i - \Delta r_j; \mathcal{C}_1, \mathcal{C}_2)$ is a measure of pair-wise registration agreement between the quasi-invariant representations $\mathcal{C}_1$ and $\mathcal{C}_2$ geometrically adjusted by $\Delta r_i - \Delta r_j$; and $w_{ij}$ is a binary weight (valued 0 or 1) which includes or excludes the contribution of the corresponding “local”, i.e. pair-wise, registration to the global criterion.
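To make the structure of (1) concrete, the following is a minimal Python sketch of evaluating the fitness for integer translations. It assumes a hypothetical callable `rho(i, j, d)` standing in for the pair-wise agreement measure (in practice a lookup into a precomputed table) and a binary weight matrix `weights`; the function name and interface are illustrative, not the authors' implementation.

```python
import numpy as np

def fitness(shifts, weights, rho):
    """Evaluate the set-based fitness J of Eq. (1) for integer translations.

    shifts  -- (n, 2) array of adjustments Delta r_k as (row, col) offsets
    weights -- (n, n) binary matrix w_ij from the constraints graph
    rho     -- hypothetical callable rho(i, j, d): agreement of C(I_i) and
               C(I_j) for the relative adjustment d = Delta r_i - Delta r_j
    """
    shifts = np.asarray(shifts, dtype=int)
    n = len(shifts)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if weights[i, j]:  # only edges of the constraints graph contribute
                d = tuple(shifts[i] - shifts[j])
                total += rho(i, j, d)
    return total
```

Because J depends only on the differences between adjustments, adding the same offset to every $\Delta r_k$ leaves its value unchanged, which is why fixing the first image as the reference loses nothing.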

The design of the local elements in (1), specifically the quasi-invariant image representation and the corresponding measure of registration agreement, is addressed in Sec. 2.3. Presently we focus on the global issues, that is, the problem of selecting which weights $w_{ij}$ should vanish and which should assume a unit value. We think of this process as the construction of a constraints graph $G = (V, E)$ whose vertices (nodes) correspond to input images, $V = \{I_1, I_2, \ldots, I_n\}$, and whose edges $E = \{w_{1,1}, w_{1,2}, \ldots, w_{n,n-1}, w_{n,n}\}$ encode the set of local constraints which contribute to the set-based registration fitness function.

Since we assume that the extent and the nature of appearance changes across input images present a major challenge to pair-wise registration, it is a premise inherent in our approach that the additional constraints and information should come from the structure of the graph $G$. Thus the central question becomes what the topology of this graph should be to ensure that additional information is indeed extracted.

Qualitatively speaking, we identify two types of good registration reinforcement that our constraints graph can effect. The first of these concerns similar input images and thus acts in a proximal fashion in the image space. The intuition is that similar images, i.e. images which are close in the input image space in the Euclidean distance sense, correspond to the scene imaged in similar conditions (e.g. few differences due to transient objects in the images, similar illumination conditions etc.). By virtue of this observation, such images should be easier to register in a pair-wise manner. Consequently, there should be a connection between the corresponding nodes in the constraints graph so as to ensure that the global registration is built upon such reliable pair-wise registrations. This allows reliable registrations to propagate their initially local constraints across the graph, achieving a global effect. The second type of constraint we identify acts in a manner opposite to the proximal constraint, in that it seeks to connect images which are distant from others. Intuitively speaking, these are images which have been acquired in conditions very much unlike those of any of the other images (more generally, we can talk about cliques of distant images which are separated from the rest of the data). Good connectivity of the graph nodes corresponding to these images is desirable because these images cannot be included in the set registration scheme by means of concatenated reliable pair-wise registrations; some other means of meaningfully pooling information from the rest of the graph is needed. The premise behind the idea that distant images (or indeed cliques of such images) should be richly connected to the rest of the graph is that the pair-wise appearance differences corresponding to their connections are likely to be approximately uncorrelated. By including a rich set of connections, the effect of changeable elements of the scene is outweighed by the persistent and reliable structures which remain stable across different connections.

Based on these two key ideas, we investigated four different elementary building-block schemes used to construct the constraints graph. Two of them are used to establish proximal connections, while the other two are their distal analogues:

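• scheme 1: connections local in the Euclidean sense (k nearest neighbours):

$$
w_{ij} = \begin{cases} 0 & : \; i = j \;\lor\; I_j \text{ is not one of the } k \text{ nearest neighbours of } I_i \\ 1 & : \; I_j \text{ is one of the } k \text{ nearest neighbours of } I_i \end{cases} \tag{2}
$$

• scheme 2: connections local in the Euclidean sense (distance threshold):

$$
w_{ij} = \begin{cases} 0 & : \; i = j \;\lor\; d(I_i, I_j) > d_{\mathrm{thres1}} \\ 1 & : \; i \neq j \;\land\; d(I_i, I_j) \leq d_{\mathrm{thres1}} \end{cases} \tag{3}
$$

• scheme 3: connections distal in the Euclidean sense (k furthest images):

$$
w_{ij} = \begin{cases} 0 & : \; i = j \;\lor\; I_j \text{ is not one of the } k \text{ furthest images from } I_i \\ 1 & : \; I_j \text{ is one of the } k \text{ furthest images from } I_i \end{cases} \tag{4}
$$

• scheme 4: connections distal in the Euclidean sense (distance threshold):

$$
w_{ij} = \begin{cases} 0 & : \; d(I_i, I_j) < d_{\mathrm{thres2}} \\ 1 & : \; d(I_i, I_j) \geq d_{\mathrm{thres2}} \end{cases} \tag{5}
$$

where the distance between two images is in all cases measured in the original Euclidean space:

$$
d(I_i, I_j) = \| I_i - I_j \|_2 = \Big( \sum_{r} \big( I_i(r) - I_j(r) \big)^2 \Big)^{1/2} \tag{6}
$$

Figure 2: Illustration of four schemes for constructing the local constraints graph over a set of images. Images (blue circles) are conceptually shown projected onto the 2D principal component space. (a) Scheme 1: k-nearest neighbours; (b) scheme 2: thresholded proximity.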


Notice that schemes 2 and 4 result in symmetric edges (i.e. $w_{ij} = 1 \Leftrightarrow w_{ji} = 1$), while in general this is not the case for schemes 1 and 3. The four schemes are illustrated and compared conceptually in Fig. 2.
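The four weighting rules (2) to (5) can be sketched in NumPy as follows, assuming each image is flattened into a vector so that (6) becomes an ordinary Euclidean distance. The function names and the threshold parameters `thr1`, `thr2` (standing for $d_{\mathrm{thres1}}$ and $d_{\mathrm{thres2}}$) are illustrative; ties in distance are broken by image index.

```python
import numpy as np

def pairwise_distances(images):
    """Euclidean distances d(I_i, I_j) of Eq. (6) between flattened images."""
    X = np.stack([np.asarray(im, dtype=float).ravel() for im in images])
    sq = np.sum(X * X, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.sqrt(np.maximum(D2, 0.0))  # clip small negatives from round-off

def _ranked_others(D, i):
    """Indices of all images except I_i, ordered from nearest to furthest."""
    return [j for j in np.argsort(D[i], kind="stable") if j != i]

def scheme1_knn(D, k):
    """Eq. (2): w_ij = 1 iff I_j is one of the k nearest neighbours of I_i."""
    W = np.zeros(D.shape, dtype=int)
    for i in range(len(D)):
        W[i, _ranked_others(D, i)[:k]] = 1
    return W

def scheme2_threshold(D, thr1):
    """Eq. (3): w_ij = 1 iff i != j and d(I_i, I_j) <= thr1."""
    W = (D <= thr1).astype(int)
    np.fill_diagonal(W, 0)
    return W

def scheme3_kfurthest(D, k):
    """Eq. (4): w_ij = 1 iff I_j is one of the k furthest images from I_i."""
    W = np.zeros(D.shape, dtype=int)
    for i in range(len(D)):
        if k > 0:
            W[i, _ranked_others(D, i)[-k:]] = 1
    return W

def scheme4_threshold(D, thr2):
    """Eq. (5): w_ij = 1 iff d(I_i, I_j) >= thr2 (diagonal forced to 0)."""
    W = (D >= thr2).astype(int)
    np.fill_diagonal(W, 0)
    return W
```

On three one-pixel "images" with values 0, 1 and 10, scheme 1 with k = 1 links the third image to the second but not vice versa, illustrating the asymmetry noted above, whereas the thresholded schemes produce symmetric matrices by construction.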

We obtained the best results by combining two schemes, one proximal and one distal. The choice of the specific schemes was not found to affect the results significantly, and henceforth we adopt the combined use of schemes 1 and 3. An example of a graph built in this fashion is shown in Fig. 3.
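A sketch of the adopted combination of schemes 1 and 3 follows. The text does not state how the two schemes are merged; here we assume the union of their edges ($w_{ij} = 1$ if either scheme sets it), and `k_near`, `k_far` are hypothetical names for the two neighbourhood sizes.

```python
import numpy as np

def constraints_graph(images, k_near=2, k_far=2):
    """Constraints-graph weights combining scheme 1 (k nearest neighbours)
    and scheme 3 (k furthest images); edges are merged by union (assumed)."""
    X = np.stack([np.asarray(im, dtype=float).ravel() for im in images])
    sq = np.sum(X * X, axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0))
    n = len(X)
    W = np.zeros((n, n), dtype=int)
    for i in range(n):
        # other images ranked from nearest to furthest in Euclidean distance
        order = [j for j in np.argsort(D[i], kind="stable") if j != i]
        W[i, order[:k_near]] = 1          # proximal edges (scheme 1)
        if k_far > 0:
            W[i, order[-k_far:]] = 1      # distal edges (scheme 3)
    return W
```

With 10 images and `k_near = k_far = 2`, each node has exactly four outgoing edges, because the two nearest and the two furthest of the nine other images never coincide.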


2.3 Local constraint: pair-wise registration quality 

In the previous section we described how in our method ‘local’, pair-wise registration information is propagated globally, that is, across the entire set of images being registered. The aim was to integrate the available information in a meaningful manner which leads to a globally good solution by virtue of local constraints. From this it is clear that ultimately the elementary building block of the scheme, and its potential bottleneck, is the way registration between a pair of images is assessed: while the extent of the challenge posed by large appearance changes makes it unrealistic to expect a highly accurate result when only pairs of images are used, it is crucial that pair-wise registration is sufficiently powerful to drive the constraints graph optimization.

Our initial experiments with a variety of interest point detectors and local feature descriptors suggested that the extent of local appearance changes in our data is so substantial that very few reliable keypoint matches could be made [Arandjelović et al., 2015]. Furthermore, we found that keypoint-based approaches did not readily lend themselves to efficient integration in our constraints graph framework. Thus, we developed a holistic approach instead. At this pair-wise level, our approach consists of two steps. Firstly, input images are processed using simple filters to produce a quasi-illumination invariant representation (illumination is readily recognized as causing the most substantial changes between different images). The registration quality for a particular translation is then quantified using the simple normalized cross-correlation coefficient.

Figure 3: A constraints graph is used to propagate registration information from pair-wise image comparisons globally. Its nodes correspond to input images (here shown projected onto their 3D linear principal subspace), while the connections between them encode which pair-wise comparisons contribute to the fitness function used to quantify the quality of registration at the level of the entire set.

Considering that illumination changes cause the most substantial appearance changes that we wish our representation to be invariant to, we focused our attention on various filters which preserve edge-like, high-frequency information content in images. We experimented with high-pass filters [Arandjelović and Cipolla, 2006; Gangkofner et al., 200…], quotient representations [Arandjelović, 2009; Arandjelović, 2013], distance-transformed edge maps [Liu et al., 2010; Arandjelović, 2012b; Arandjelović and Cipolla, 2013], and others [Arandjelović, 2012c], with limited success. The representation that we found effective, and which we therefore adopt henceforth, is the absolute value of the high-pass filtered image:


$$
\mathcal{C}_i = \mathcal{C}(I_i) = \big| I_i - \{ I_i * G(\sigma) \} \big| \tag{7}
$$


where $G(\sigma)$ is the isotropic 2D Gaussian kernel with the standard deviation ‘width’ parameter $\sigma$, and $*$ denotes 2D convolution. The quality of registration agreement between two such quasi-illumination invariant images $\mathcal{C}_i$ and $\mathcal{C}_j$, geometrically transformed for the specified registration parameters $\Delta r_{ij} = \Delta r_i - \Delta r_j$, is then quantified using the normalized cross-correlation coefficient $\rho(\Delta r_{ij}; \mathcal{C}_i, \mathcal{C}_j)$:


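$$
\rho(\Delta r_{ij}; \mathcal{C}_i, \mathcal{C}_j) = \frac{\sum_{r \in \Omega} \mathcal{C}_i(r) \, \mathcal{C}_j(r + \Delta r_{ij})}{\sqrt{\sum_{r \in \Omega} \mathcal{C}_i(r)^2 \, \sum_{r \in \Omega} \mathcal{C}_j(r + \Delta r_{ij})^2}} \tag{8}
$$

where $\Omega$ denotes the set of pixel locations at which the two geometrically adjusted images overlap.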
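A direct evaluation of (8) for a single integer translation is sketched below; it is slow, but serves as the reference that any accelerated computation must reproduce. It assumes both representations have the same size and that $\Delta r_{ij} = (dy, dx)$ is given in whole pixels; returning 0 for an empty or all-zero overlap is our own convention.

```python
import numpy as np

def ncc_at_shift(Ci, Cj, dr):
    """Direct evaluation of Eq. (8) for one integer shift dr = (dy, dx):
    sums run over pixels r where both C_i(r) and C_j(r + dr) are defined."""
    h, w = Ci.shape
    dy, dx = dr
    y0, y1 = max(0, -dy), min(h, h - dy)   # rows r with 0 <= r + dy < h
    x0, x1 = max(0, -dx), min(w, w - dx)   # cols r with 0 <= r + dx < w
    if y0 >= y1 or x0 >= x1:
        return 0.0                          # the two images do not overlap
    a = Ci[y0:y1, x0:x1]
    b = Cj[y0 + dy:y1 + dy, x0 + dx:x1 + dx]
    den = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / den) if den > 0 else 0.0
```

If $\mathcal{C}_j$ is a crop of the same scene offset by $\Delta r_{ij}$, the coefficient at that shift is 1, and it drops at other shifts.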


While the use of a high-pass filter ensures that the most significant responses occur around edge-like structures, taking its absolute value achieves invariance to the sign of the corresponding gradients, i.e. bright-dark vs. dark-bright interfaces. We will refer to this representation as ABS-HP. An example is shown in Fig. 4.
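The ABS-HP representation of (7) amounts to subtracting a Gaussian-blurred copy of the image and taking the absolute value. The sketch below assumes SciPy's `scipy.ndimage.gaussian_filter` as the convolution with $G(\sigma)$; the reflective boundary handling is our assumption, since the text does not specify how borders are treated.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def abs_hp(image, sigma):
    """ABS-HP representation of Eq. (7): |I - I * G(sigma)|."""
    I = np.asarray(image, dtype=float)
    low = gaussian_filter(I, sigma=sigma, mode="reflect")  # I * G(sigma)
    return np.abs(I - low)                                   # high-pass, then magnitude
```

The output is unchanged by adding a constant to the image and by inverting its contrast; a global gain factor scales the output, but that factor cancels in the normalized coefficient (8).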


Figure 4: Image patch extracted from a raw input image (left), after high-pass filtering (centre), and after taking the absolute value of the high-pass filter output (right).


Computational considerations 

The computation of the normalized cross-correlation coefficient $\rho(\Delta r_{ij}; \mathcal{C}_i, \mathcal{C}_j)$ in (8) can be extremely slow: if implemented ‘naively’ in the image domain, the number of computations is approximately $4 \times w \times h$, where $w$ and $h$ are the width and the height of an image in pixels. This is a potential bottleneck, as the value of the coefficient is needed for a different $\Delta r_{ij}$ in each iteration of the maximization of the fitness function in (1). It is an attractive feature of the proposed framework that it lends itself to an efficient solution of this problem. Firstly, we employ the well-known fast Fourier transform-based pre-computation of the full cross-correlation matrix [Reddy and Chatterji, 1996]. Briefly, the (unnormalized) cross-correlation $\star$ between images $\mathcal{C}_i$ and $\mathcal{C}_j$, defined as:

$$
\tilde{\rho}(\Delta r_{ij}; \mathcal{C}_i, \mathcal{C}_j) = \sum_{r} \mathcal{C}_i(r) \, \mathcal{C}_j(r + \Delta r_{ij}) = (\mathcal{C}_i \star \mathcal{C}_j)(\Delta r_{ij}),
$$

can be computed efficiently in the Fourier domain by exploiting the convolution theorem:

$$
\tilde{\rho}(\Delta r_{ij}; \mathcal{C}_i, \mathcal{C}_j) = \mathcal{F}^{-1}\big\{ \mathcal{F}(\mathcal{C}_i)^{*} \cdot \mathcal{F}(\mathcal{C}_j) \big\}(\Delta r_{ij}) \tag{9}
$$

where $\mathcal{F}$ is the Fourier transform operator, $\cdot$ point-wise multiplication, and $(\ldots)^{*}$ complex conjugation. This solves the problem of computing the numerator in (8). The computation is performed once and need not be repeated in each iteration; rather, the corresponding value is simply looked up. However, it is crucial that this value is normalized by the denominator in (8), because for different registration adjustments $\Delta r_{ij}$, different parts of the two images overlap. Omitting the normalization could lead to an unfair bias towards large shifts, because small overlapping image patches on average tend to look more alike. Thus, we pre-compute the normalization values using the integral image technique [Viola and Jones, 2001]. Specifically, we compute integral images for $\mathcal{C}_i \cdot \mathcal{C}_i$ and $\mathcal{C}_j \cdot \mathcal{C}_j$, which allows us to obtain quickly the values of all possible denominators in (8) in at most three elementary operations per denominator (since the overlapped area of an image always extends from one of its corners).
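The two accelerations described above can be sketched together in NumPy: the numerator of (8) for all shifts via the FFT as in (9), and the overlap energies in the denominator via integral images of $\mathcal{C}_i \cdot \mathcal{C}_i$ and $\mathcal{C}_j \cdot \mathcal{C}_j$. This is a minimal sketch assuming two same-size images and integer shifts; zero-padding both inputs to $(2h-1) \times (2w-1)$ is our choice, made so that the circular correlation computed by the FFT equals the linear one, and the helper names are hypothetical.

```python
import numpy as np

def integral_image(X):
    """Zero-padded integral image: S[y, x] = sum of X[:y, :x]."""
    S = np.zeros((X.shape[0] + 1, X.shape[1] + 1))
    S[1:, 1:] = X.cumsum(axis=0).cumsum(axis=1)
    return S

def box_sum(S, y0, y1, x0, x1):
    """Sum of X[y0:y1, x0:x1] from its integral image (index arrays broadcast)."""
    return S[y1, x1] - S[y0, x1] - S[y1, x0] + S[y0, x0]

def ncc_all_shifts(Ci, Cj):
    """Eq. (8) for every integer shift (dy, dx) of two same-size images.
    Returns R with R[dy + h - 1, dx + w - 1] = rho((dy, dx); Ci, Cj)."""
    h, w = Ci.shape
    size = (2 * h - 1, 2 * w - 1)   # zero-padding makes the circular FFT result linear
    # Numerator, Eq. (9): sum_r Ci(r) Cj(r + dr) = F^-1{ conj(F(Ci)) . F(Cj) }
    num = np.fft.ifft2(np.conj(np.fft.fft2(Ci, s=size)) * np.fft.fft2(Cj, s=size)).real
    num = np.fft.fftshift(num)      # shift (0, 0) now sits at index (h - 1, w - 1)
    dy = np.arange(-(h - 1), h)[:, None]
    dx = np.arange(-(w - 1), w)[None, :]
    # Energy of each image inside the overlap, from integral images of Ci*Ci and Cj*Cj
    ei = box_sum(integral_image(Ci * Ci), np.maximum(0, -dy), np.minimum(h, h - dy),
                 np.maximum(0, -dx), np.minimum(w, w - dx))
    ej = box_sum(integral_image(Cj * Cj), np.maximum(0, dy), np.minimum(h, h + dy),
                 np.maximum(0, dx), np.minimum(w, w + dx))
    den = np.sqrt(ei * ej)
    return np.divide(num, den, out=np.zeros_like(num), where=den > 1e-12)
```

The resulting table can be computed once per edge of the constraints graph and then only looked up while (1) is being maximized.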

2.4 Fitness function maximization 

Having pre-computed the full normalized cross-correlation matrix of the filtered images which are being registered, the fitness function introduced in (1) can be readily maximized using the steepest ascent method; we initialize the process by setting $\forall i: \Delta r_i = 0$. The final issue we address here concerns the difficulty posed by what can be described as the limited spatial influence of characteristic features extracted by our ABS-HP representation. Consider what happens when the initial misalignment between two images is large. Because our representation is based on a high-pass filter, the maximal filter responses are observed around edge-like structures. Since these structures are narrow, their responses do not have enough spatial reach to guide the optimization in the correct direction. While this problem becomes much less noticeable when larger image sets are used, rather than the minimal set comprising two images, it is nonetheless beneficial for this potential pitfall to be avoided altogether. We achieve this by approaching the problem in a coarse-to-fine fashion. Specifically, observe that by varying the bandwidth of the high-pass filter used to extract our ABS-HP representation, it is possible to trade off the breadth of spatial influence of a filter response against its localization power. Thus, we start the registration process by using a wide-band high-pass filter, and follow that with progressively narrower-band filters as convergence at each level is detected. The particular filters we used in our experiments have the values of 40, 20, 8, and 3 pixels for the parameter $\sigma$ in (7).
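The maximization and the coarse-to-fine schedule can be sketched as follows. This is a discrete steepest-ascent (hill-climbing) variant over integer translations that moves one image by one pixel per iteration, which is our assumption; the text does not specify step sizes or the convergence test. `make_rho(sigma)` is a hypothetical factory returning the pair-wise agreement for a given filter width, for example lookups into the normalized cross-correlation tables of the ABS-HP images computed at that $\sigma$.

```python
import numpy as np

# the 8 unit moves of one image on the integer lattice
STEPS = [np.array((dy, dx)) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]

def fitness(shifts, W, rho):
    """Eq. (1): sum of rho(i, j, Delta r_i - Delta r_j) over graph edges."""
    n = len(shifts)
    return sum(rho(i, j, tuple(shifts[i] - shifts[j]))
               for i in range(n) for j in range(n) if W[i, j])

def steepest_ascent(shifts, W, rho, max_iter=10_000):
    """Discrete steepest-ascent hill climbing: at each iteration apply the single
    one-pixel move of one non-reference image that increases J the most."""
    shifts = np.array(shifts, dtype=int)
    best = fitness(shifts, W, rho)
    for _ in range(max_iter):
        move, best_f = None, best
        for k in range(1, len(shifts)):      # image 0 is the fixed reference
            for step in STEPS:
                cand = shifts.copy()
                cand[k] += step
                f = fitness(cand, W, rho)
                if f > best_f:
                    move, best_f = (k, step), f
        if move is None:                     # no improving move: converged
            break
        shifts[move[0]] += move[1]
        best = best_f
    return shifts, best

def coarse_to_fine(n, W, make_rho, sigmas=(40, 20, 8, 3)):
    """Start from Delta r_i = 0 and refine with progressively narrower filters,
    warm-starting each level from the previous level's solution."""
    shifts = np.zeros((n, 2), dtype=int)
    for sigma in sigmas:
        shifts, _ = steepest_ascent(shifts, W, make_rho(sigma))
    return shifts
```

Because each move shifts a single image, this search can stall where improving J would require moving several images together; a continuous-gradient or joint-move variant would avoid that, at the cost of a more involved update rule.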

3 Evaluation 

In this section we describe our evaluation of the proposed 
method, and report and discuss its performance in the context 
of the current state-of-the-art. We begin by describing our 
data set and evaluation protocol, follow with a presentation 
of a comprehensive set of performance statistics, and finish 
off with an analysis of our results and their significance. 

3.1 Data 

As reviewed in detail in Sec the existing literature on the 
registration of aerial imagery is void of any set-based ap¬ 
proaches, the present paper being the pioneering work in the 
area. It is an unsurprising consequence of this that there are 
no public data sets suitable for the evaluation of set-based al¬ 
gorithms so we collected a novel data set ourselves. 

Our data set comprises 10 image sets, each set containing 10 images acquired at different, non-uniformly distributed dates, as illustrated in Fig. 5. This data was manually downloaded using the freely accessible web portal provided by Nearmap Ltd. Nearmap has developed technology for rapid acquisition of high-resolution aerial imagery, which allows frequent re-imaging of large areas. The most time-demanding task in their pipeline concerns the registration and stitching of aerial images (tiles) to form a continuous representation. Registration is performed using a combination of manual input and state-of-the-art commercial software; thus, different images within the same image set in our data correspond to approximately the same land area. Both GPS and image data are used in the registration process, the latter being based on a robust alignment of local features (the exact details of the registration algorithm are proprietary and as such were not disclosed to us in their entirety).

3.2 Protocol 

We first obtained an estimate of the ground truth by manually labelling images. Specifically, the optimal registration parameters were estimated from correspondences between selected characteristic image loci. Loci physically lying on the ground plane were consistently chosen to avoid the problem of the varying viewpoint from which images are obtained, which can significantly change the perspective from which different surfaces (e.g. house walls or roofs) are seen.
Figure 5: The distribution of acquisition dates for a typical image set in our data. It can be readily seen that the acquisition was not performed at regular intervals: in some instances images are re-acquired after only a month, while in others several months pass.


Table 1: Baseline performance (all statistics are in pixels).

Method     Mean error   Deviation (between sets)   Deviation (within a set, mean)
Nearmap       18.0              0.75                         24.6
ARRSI        248.5             77.2                         139.8
SURF         140.9            110.9                         266.9


After obtaining an estimate of the ground truth, we conducted three baseline experiments. Firstly, we assessed the quality of registration performed by Nearmap. In addition, we evaluated two popular registration approaches from the literature: (i) using SURF [Bay et al., 2008] feature correspondences (similar to [Arandjelović, 2012d], which uses SIFT), and (ii) the state-of-the-art ARRSI algorithm [Wong and Clausi, 2007], specifically tailored to aerial images, which is also sparse in nature and uses phase congruency-based control points. Feature matching in both cases is performed robustly using RANSAC.
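For reference, the robust-matching step of such sparse baselines can be sketched for a translation-only model, as in Sec. 2.1. The sketch assumes putative correspondences `src`, `dst` are already available (feature detection, description and matching, e.g. with SURF, are not shown), and the tolerance `tol` and iteration count are illustrative values, not those used in the experiments.

```python
import numpy as np

def ransac_translation(src, dst, n_iter=500, tol=3.0, seed=0):
    """RANSAC estimate of a 2D translation t with dst ~= src + t.
    src, dst: (m, 2) arrays of putative point correspondences."""
    src, dst = np.asarray(src, dtype=float), np.asarray(dst, dtype=float)
    disp = dst - src                          # every match proposes a translation
    rng = np.random.default_rng(seed)
    best = np.zeros(len(disp), dtype=bool)
    for _ in range(n_iter):
        k = rng.integers(len(disp))           # minimal sample: a single match
        inliers = np.linalg.norm(disp - disp[k], axis=1) < tol
        if inliers.sum() > best.sum():        # keep the largest consensus set
            best = inliers
    return disp[best].mean(axis=0), best      # refit on the inliers
```

RANSAC tolerates outlying matches, but it cannot recover when most putative correspondences are wrong, which is the failure mode discussed in Sec. 3.3.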


3.3 Results and discussion 

Using our ground truth labelling, we were able to estimate that the average registration error of the proprietary method employed by Nearmap is approximately 18 pixels. Like most of Nearmap’s imagery, our data was acquired at a resolution at which one pixel width corresponds to 7.5 cm on the ground. Thus, the average misalignment of two images considered by Nearmap to show the same patch of land is about 1.35 m. This error is large enough to limit the power of subsequent processing for image understanding; one example is the detection of permanent structural change (e.g. solar panel installation, well drilling etc.), which is of major interest to local councils and governments. It is insightful to notice that while the standard deviation of the mean misalignment error between sets was found to be very small indeed (0.75 pixels, or 5.6 cm on the ground), the mean deviation within a set was far larger (24.6 pixels, or 1.85 m); see Table 1. This strongly supports one of the premises of our work: that the primary challenge is posed not by the content of the aerial scene (i.e. the structures in it), but rather by the changes that the scene exhibits over time (illumination being the most substantial one).
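The conversions from pixels to ground distance used above follow from the stated resolution of 7.5 cm per pixel; a minimal Python check of the arithmetic:

```python
GSD_CM = 7.5  # ground distance covered by one pixel, in cm (from the text)

def px_to_m(pixels, gsd_cm=GSD_CM):
    """Convert an image-space error in pixels to metres on the ground."""
    return pixels * gsd_cm / 100.0

for label, px in [("mean error", 18.0),
                  ("between-set deviation", 0.75),
                  ("within-set deviation", 24.6)]:
    print(f"{label}: {px} px = {px_to_m(px):.3f} m")
```

The three values are 1.35 m, 0.05625 m and 1.845 m, which correspond to the 1.35 m, 5.6 cm and 1.85 m quoted above.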

We next turn our attention to the two baseline methods from the literature. As the summary in Table 1 clearly shows, both of these performed very poorly on our data. Not only did neither of the methods manage to improve on the original registration by Nearmap, both of them increased the average registration error by approximately an order of magnitude. While perhaps surprising at first, this finding is readily explained following a more in-depth examination of the results. Specifically, in both cases observe the extremely large deviation of the misalignment error within an image set: while for some image pairs registration was highly successful (error of ≈4 pixels), in other cases unreliable interest point correspondences resulted in grossly inaccurate results (errors in excess of 400 pixels). Indeed, as stated in Sec. 2.3, this is consistent with our experiments using a variety of interest point descriptors: while highly successful in the registration of images acquired in similar illumination conditions, even in the presence of different small transient objects, sparse feature-based methods exhibit a dramatic drop in performance as illumination conditions change. We found them to lack sufficient robustness to deal with the challenges in real-world images such as those in our data set.

Lastly, we present the results obtained using the proposed method. For all sets we found that our method substantially decreased Nearmap’s registration error. On average, the reduction was 75.8%, resulting in an average absolute image error of only 4.4 pixels, or 32.7 cm on the ground. A plot detailing the performance of the method is shown in Fig. 6(a). While our method too exhibited some variation across different sets, even in the worst case (set number 6) the error was reduced by over 60%. This demonstrates the achievement of our first goal: developing a registration method that is both more accurate and more robust than the state of the art used commercially or described in the academic literature.
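As a quick consistency check of these figures, the following Python arithmetic applies the reported mean reduction to the mean baseline error. It is only approximate, since an average of per-set reductions applied to an average error need not equal the average of the per-set residual errors.

```python
BASELINE_PX = 18.0   # mean Nearmap registration error (Table 1)
REDUCTION = 0.758    # mean relative error reduction reported for the proposed method
GSD_CM = 7.5         # ground distance per pixel, in cm

residual_px = BASELINE_PX * (1.0 - REDUCTION)
residual_cm = residual_px * GSD_CM
print(f"residual error: {residual_px:.2f} px = {residual_cm:.1f} cm")
```

This gives about 4.36 pixels, i.e. about 32.7 cm, in line with the 4.4 pixels and 32.7 cm stated above.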



A major premise of our work, and a methodological novelty, pertaining to the joint co-registration of aerial image sets rather than image pairs is substantiated by the data shown in Fig. 6(b). This plot compares the reduction in the mean registration error (relative to Nearmap’s baseline error) when our method is applied to minimal 2-image sets (i.e. subsets of the original sets), and when all the available data (10 images) is used instead. Indeed, it can be readily seen that the error is consistently reduced in the case of all sets. Even in the worst case (set number 1) the reduction is over two-fold, while in the best case it is more than nine-fold (set number 4).

The variation in the benefit, that is, the reduction in the registration error across sets corresponding to different land areas obtained by using 10 as opposed to 2 images, led us to investigate this specific aspect of our results in greater detail. By examining the variation in the average registration error as the size of an image set is gradually increased, we noticed that the variation observed in Fig. 6(b) primarily emerges as a consequence of the variation in the registration error obtained for the minimal set size (i.e. pair-wise registration) [Arandjelović et al., 2015]. Already for sets of size 3 the variation (absolute, as well as relative) across different sets is much reduced. This finding too strongly supports the premise that the constraints which can be extracted from the use of more than two images are a powerful source of information which can be harnessed to increase the accuracy and reliability of registration in the presence of large appearance changes.

Lastly, a summary of the computational cost statistics is given in Table 2. As expected, the average time for registration per image achieved using the proposed method increases somewhat with the increase in the set size, due to the greater complexity of the constraints graph. Nonetheless, in all cases our method’s computational cost was significantly lower than that of either of the state-of-the-art methods.

Table 2: Computational cost comparison. ARRSI and the SURF-based methods were implemented primarily in C, with a Matlab ‘wrapper’, while the proposed method was implemented fully in Matlab. The estimates are averages of 100 executions run in Matlab 7 on an AMD Phenom II X4 965 processor with 8 GB RAM.

Method     Set size   Registration time (s per image)
Proposed      10                 5.6
Proposed       5                 3.8
Proposed       3                 3.9
ARRSI         n/a               17.5
SURF          n/a               22.5

Figure 6: (a) Mean pair-wise registration error (blue bars) per image set obtained with the proposed method, relative to the initial error. In all cases the error reduction is dramatic, averaging 75.8%. The red stems extending from the blue bars show the standard deviation of the relative registration error within each of the 10 image sets. (b) The reduction in the mean registration error per image set using 10-image sets relative to 2-image sets. In all cases the error is at least halved, and on average it is reduced 3.2 times.


4 Summary and conclusions


In this paper we introduced a novel method for the registration of aerial images. Unlike previous work, which considered either pair-wise image-to-image or image-to-map registration, our approach jointly registers an entire set of images of approximately the same area but acquired at different times. We formulated the joint registration problem as an optimization scheme. Set-based registration was built upon simple pair-wise registrations which are mutually constrained by means of a constraints graph. We showed how this graph can be constructed automatically, using a combination of two rules, one proximal, the other distal in the Euclidean image space. Using a novel data set suitable for the evaluation of set-based aerial image registration algorithms, we demonstrated that the proposed approach significantly outperforms the current state of the art in terms of accuracy and reliability, as well as speed. Amongst other possible directions for improvement, our future work will investigate the use of colour invariants [Arandjelović, 2012].